Mining MEDLINE: Abstracts, Sentences, or Phrases?

Authors

  • Jing Ding
  • Daniel Berleant
  • Dan Nettleton
  • Eve Syrkin Wurtele
Abstract

A growing body of work addresses automated mining of biochemical knowledge from digital repositories of scientific literature, such as MEDLINE. Some of these studies use abstracts as the unit of text from which to extract facts. Others use sentences, while still others use phrases. Here we compare abstracts, sentences, and phrases in MEDLINE using the standard information retrieval performance measures of recall, precision, and effectiveness, for the task of mining interactions among biochemical terms based on term co-occurrence. Results show statistically significant differences that can impact the choice of text unit.
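As a rough illustration of the comparison described in the abstract, the sketch below scores term co-occurrence at two text-unit sizes (abstract and sentence) on a toy corpus. Everything in it is an assumption made for illustration: the example texts, the hand-assigned interaction labels, the function names, and the use of plain case-insensitive substring matching without synonym handling. It is not the authors' procedure, and it covers only precision and recall, with recall measured against the interactions found in abstracts.

```python
def cooccurs(text, term_a, term_b):
    """Return True if both terms appear in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return term_a.lower() in lowered and term_b.lower() in lowered


def precision_recall(units, term_a, term_b, interactions_in_abstracts):
    """Score one text-unit type.

    units: list of (text, describes_interaction) pairs; the boolean labels are
           hand-assigned (hypothetical) judgments.
    interactions_in_abstracts: number of interactions found at the abstract
           level, used as the recall denominator.
    """
    # Keep the labels of the units in which both terms co-occur.
    hit_labels = [label for text, label in units if cooccurs(text, term_a, term_b)]
    true_hits = sum(hit_labels)
    precision = true_hits / len(hit_labels) if hit_labels else 0.0
    recall = true_hits / interactions_in_abstracts if interactions_in_abstracts else 0.0
    return precision, recall


# Toy corpus: three invented abstracts, labeled at the abstract level.
abstract_units = [
    ("Gibberellin induces amylase expression. The effect was dose dependent.", True),
    ("We measured gibberellin levels in seeds. Amylase activity was assayed separately.", False),
    ("Gibberellin was applied to aleurone layers. This treatment increased amylase secretion.", True),
]

# The same text split into sentences, labeled at the sentence level.
sentence_units = [
    ("Gibberellin induces amylase expression.", True),
    ("The effect was dose dependent.", False),
    ("We measured gibberellin levels in seeds.", False),
    ("Amylase activity was assayed separately.", False),
    ("Gibberellin was applied to aleurone layers.", False),
    ("This treatment increased amylase secretion.", False),
]

# Recall denominator: interactions described within abstracts.
total = sum(label for _, label in abstract_units)

abstract_scores = precision_recall(abstract_units, "gibberellin", "amylase", total)
sentence_scores = precision_recall(sentence_units, "gibberellin", "amylase", total)
print("abstracts:", abstract_scores)
print("sentences:", sentence_scores)
```

On this toy data the smaller unit gains precision and loses recall, because an interaction spread across two sentences is only caught at the abstract level. Real data need not behave the same way.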


Similar resources

Pacific Symposium on Biocomputing 7:326-337 (2002). MINING MEDLINE: ABSTRACTS, SENTENCES, OR PHRASES?

$$\text{recall} = \frac{\#\text{ of interactions between } A \text{ and } B \text{ occurring within a type of text unit}}{\#\text{ of interactions between } A \text{ and } B \text{ occurring within abstracts}}$$

where A and B are query terms or their synonyms. Intuitively, recall here measures the capacity of a given text unit to contain the interactions present in MEDLINE abstracts. Any interaction described within a particular text unit is also described within all la...


Finding Cue Expressions for Knowledge Extraction from Scientific Text: Early Results

This paper investigates whether and how natural language processing and data mining techniques can be utilized for locating desired knowledge in a large text collection. This task amounts to finding cue words and phrases indicating the location of knowledge, where the challenge is to establish a methodology that can cope with the diversity of expressions. We examine the feasibility of mining cu...


Proceedings of the Pacific Knowledge Acquisition Workshop 2004

This paper investigates whether and how natural language processing and data mining techniques can be utilized for locating desired knowledge in a large text collection. This task amounts to finding cue words and phrases indicating the location of knowledge, where the challenge is to establish a methodology that can cope with the diversity of expressions. We examine the feasibility of mining cu...


Identifying Sections in Scientific Abstracts using Conditional Random Fields

OBJECTIVE: The prior knowledge about the rhetorical structure of scientific abstracts is useful for various text-mining tasks such as information extraction, information retrieval, and automatic summarization. This paper presents a novel approach to categorize sentences in scientific abstracts into four sections, objective, methods, results, and conclusions. METHOD: Formalizing the categorizati...


PathBinderH: a Tool for Sentence-Focused, Plant Taxonomy-Sensitive Access to the Biological Literature

Mining the biological “literaturome” promises significant advancements in genome annotation, literature access, curation support, and other applications. Standard tools allow users to identify scientific abstracts containing one or more query terms. In contrast, PathBinderH is a Webserved text mining tool that allows users to search PubMed (including MEDLINE) for sentences with co-occurring ter...



Journal title:
  • Pacific Symposium on Biocomputing


Publication date: 2002